Knowledge-driven geospatial location resolution for phylogeographic models of virus migration
Authors
Abstract
Diseases caused by zoonotic viruses (viruses transmissible between humans and animals) are a major threat to public health throughout the world. By studying virus migration and mutation patterns, the field of phylogeography provides a valuable tool for improving their surveillance. A key component in phylogeographic analysis of zoonotic viruses involves identifying the specific locations of relevant viral sequences. This is usually accomplished by querying public databases such as GenBank and examining the geospatial metadata in the record. When sufficient detail is not available, a logical next step is for the researcher to conduct a manual survey of the corresponding published articles.

MOTIVATION: In this article, we present a system for detection and disambiguation of locations (toponym resolution) in full-text articles to automate the retrieval of sufficient metadata. Our system has been tested on a manually annotated corpus of journal articles related to phylogeography using integrated heuristics for location disambiguation, including a distance heuristic, a population heuristic and a novel heuristic utilizing knowledge obtained from GenBank metadata (i.e. a 'metadata heuristic').

RESULTS: For detecting and disambiguating locations, our system performed best using the metadata heuristic (0.54 Precision, 0.89 Recall and 0.68 F-score). Precision reaches 0.88 when examining only the disambiguation of location names. Our error analysis showed that a noticeable increase in the accuracy of toponym resolution is possible by improving the geospatial location detection. By improving these fundamental automated tasks, our system can be a useful resource to phylogeographers who rely on geospatial metadata of GenBank sequences.
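As a quick check on how the reported scores relate, the sketch below assumes the F-score is the balanced F1 measure (the harmonic mean of precision and recall); the abstract does not state which variant was used, and the function name `f_score` is illustrative only.

```python
def f_score(precision: float, recall: float) -> float:
    """Balanced F-score: harmonic mean of precision and recall.

    Assumption: the abstract's F-score is the balanced (F1) variant.
    """
    if precision + recall == 0:
        return 0.0  # avoid division by zero when both scores are 0
    return 2 * precision * recall / (precision + recall)


# Scores as printed in the abstract (already rounded to two decimals).
from_rounded = f_score(0.54, 0.89)

# Largest precision and recall that would still round to 0.54 and 0.89.
upper_bound = f_score(0.545, 0.895)

print(f"F1 from rounded inputs: {from_rounded:.3f}")
print(f"F1 at upper rounding bounds: {upper_bound:.3f}")
```

Under that assumption, the rounded inputs give roughly 0.67, and values at the upper end of the rounding interval give a score that rounds to 0.68, so the reported figures are mutually consistent once rounding of the inputs is taken into account.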
Similar resources
Pleistocene climatic fluctuations drive isolation and secondary contact in the red diamond rattlesnake (Crotalus ruber) in Baja California
Editor: Brent Emerson. Abstract. Aim: Many studies have investigated the phylogeographic history of species on the Baja California Peninsula, and they often show one or more genetic breaks that are spatially concordant among many taxa. These phylogeographic breaks are commonly attributed to vicariance as a result of geological or climatic changes, followed by secondary contact when barriers are n...
Ontology-driven Automatic Geospatial-Processing Modeling based on Web-service Chaining
Earth System Science (ESS) research and applications often involve collecting, analyzing and modeling distributed heterogeneous geospatial data. Those data are processed step by step in geospatial analysis systems to extract information and knowledge products for applications and decision making. Conceptually, such a step-by-step process forms a geospatial processing model that represe...
Network Location and Risk of Human Immunodeficiency Virus Transmission among Injecting Drug Users: Results of Multiple Membership Multilevel Modeling of Social Networks
Background: Despite the implementation of harm reduction programs, some injecting drug users (IDUs) continue to engage in high-risk behaviors. Some social factors appear to contribute to the risk of human immunodeficiency virus (HIV) transmission among IDUs. The aim of this study was to analyze the social network of IDUs and examine the effect of network location on HIV transmission...
A Framework for Developing Web-Service-Based Intelligent Geospatial Knowledge Systems
©2005 The International Association of Chinese Professionals in Geographic Information Science (CPGIS). Abstract: This paper discusses an interoperable system framework for developing web-service-based intelligent geospatial knowledge systems. This type of system facilitates personalized, on-demand geospatial information or knowledge discovery and dissemination. The sys...
Three roads diverged? Routes to phylogeographic inference.
Phylogeographic methods facilitate inference of the geographical history of genetic lineages. Recent examples explore human migration and the origins of viral pandemics. There is longstanding disagreement over the use and validity of certain phylogeographic inference methodologies. In this paper, we highlight three distinct frameworks for phylogeographic inference to give a taste of this disagr...